Improving Named Entity Recognition for Chinese Social Media with Word Segmentation Representation Learning
نویسندگان
چکیده
Named entity recognition, and other information extraction tasks, frequently use linguistic features such as part of speech tags or chunkings. For languages where word boundaries are not readily identified in text, word segmentation is a key first step to generating features for an NER system. While using word boundary tags as features are helpful, the signals that aid in identifying these boundaries may provide richer information for an NER system. New state-of-the-art word segmentation systems use neural models to learn representations for predicting word boundaries. We show that these same representations, jointly trained with an NER system, yield significant improvements in NER for Chinese social media. In our experiments, jointly training NER and word segmentation with an LSTM-CRF model yields nearly 5% absolute improvement over previously published results.
منابع مشابه
Multi-task Multi-domain Representation Learning for Sequence Tagging
Many domain adaptation approaches rely on learning cross domain shared representations to transfer the knowledge learned in one domain to other domains. Traditional domain adaptation only considers adapting for one task. In this paper, we explore multi-task representation learning under the domain adaptation scenario. We propose a neural network framework that supports domain adaptation for mul...
متن کاملMulti-task Domain Adaptation for Sequence Tagging
Many domain adaptation approaches rely on learning cross domain shared representations to transfer the knowledge learned in one domain to other domains. Traditional domain adaptation only considers adapting for one task. In this paper, we explore multi-task representation learning under the domain adaptation scenario. We propose a neural network framework that supports domain adaptation for mul...
متن کاملNamed Entity Recognition in Persian Text using Deep Learning
Named entities recognition is a fundamental task in the field of natural language processing. It is also known as a subset of information extraction. The process of recognizing named entities aims at finding proper nouns in the text and classifying them into predetermined classes such as names of people, organizations, and places. In this paper, we propose a named entity recognizer which benefi...
متن کاملCorrecting Word Segmentation and Part-of-speech Tagging Errors for Chinese Named Entity Recognition
In the exploration of Chinese named entity recognition for a specific domain, the authors found that the errors caused during word segmentation and part-ofspeech (POS) tagging have obstructed the improvement of the recognition performance. In order to further enhance recognition recall and precision, the authors propose an error correction approach for Chinese named entity recognition. In the e...
متن کاملUnsupervised Segmentation Helps Supervised Learning of Character Tagging for Word Segmentation and Named Entity Recognition
This paper describes a novel character tagging approach to Chinese word segmentation and named entity recognition (NER) for our participation in Bakeoff-4.1 It integrates unsupervised segmentation and conditional random fields (CRFs) learning successfully, using similar character tags and feature templates for both word segmentation and NER. It ranks at the top in all closed tests of word segme...
متن کامل